difficulty based on a norming sample. Least-squares estimates of testee
ability, which were based solely on the difficulty perceptions of the testees,
correlated significantly with number-correct and maximum-likelihood ability
scores based on the testees' conventional responses to the items. These
results show that item-difficulty perceptions were highly related to the
"objective" indices of item difficulty often used in test construction, and
that as testee ability level increased, the items were perceived as being
relatively less difficult. The relationship between a testee's ability and
his/her perception of an individual item's relative difficulty appeared to be
weak. Of major importance was the finding that items which were appropriate
in difficulty level from a psychometric standpoint were perceived by the
testees as being too difficult for their ability levels. The effects on
testees of tailoring a test such that items are perceived as being uniformly
too difficult should be investigated.
Accuracy of Perceived Test-Item Difficulties

Conventional ability tests require all testees to answer the same set of
test items. Because testees differ in ability level, however, tests of this
kind may potentially create differential psychological environments for testees
of different ability levels. A test which is appropriately difficult for a
testee of average ability may be perceived by less able individuals as being
much too difficult, and such perceptions may lead these testees to approach the
task with anxiety and forbearance. On the other hand, individuals with higher
than average abilities may find the task a simple or even pleasant one.
Clearly, the psychological environment of a testee may vary greatly depending
on the individual's perception of the task.
Adaptive tests are designed such that each testee receives items which are
psychometrically appropriate for his/her ability level (Lord, 1970; Weiss, 1974;
Weiss & Betz, 1973). For example, items in such tests may be chosen so that
each testee, regardless of ability level, will have approximately a fifty-percent
chance of answering the item correctly (e.g., Lord, 1970). The adaptive test
may thus reduce the differential psychological environment arising from the
administration of a fixed set of items to persons of differing ability levels,
and may thereby improve the performance of low-ability students. In fact,
under certain conditions, adaptive testing has been shown to be more motivating
for low-ability testees (Betz & Weiss, 1976b) and to result in higher ability
estimates (Betz & Weiss, 1976a).
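The fifty-percent selection rule described above can be illustrated with a minimal sketch. It assumes a two-parameter normal-ogive model with no guessing, in which the probability of a correct response is the normal distribution function evaluated at a(theta - b). The function names and the item pool are hypothetical and are not taken from any of the cited procedures.

```python
from statistics import NormalDist


def p_correct(theta, a, b):
    """Normal-ogive probability of a correct response (no guessing assumed)."""
    return NormalDist().cdf(a * (theta - b))


def pick_item(theta, pool):
    """Return the (a, b) pair whose success probability is closest to .50."""
    return min(pool, key=lambda item: abs(p_correct(theta, *item) - 0.5))


# Hypothetical pool of (discrimination a, difficulty b) pairs.
pool = [(1.2, -2.0), (1.0, -0.5), (1.5, 0.4), (0.9, 1.8)]
chosen = pick_item(0.5, pool)  # item best matched to a testee at theta = 0.5
```

Under this model an item gives exactly a fifty-percent chance when its difficulty b equals the testee's ability, so the rule amounts to choosing the item whose b is nearest the current ability estimate.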



Holtzman (1970) points out the potential importance of psychological factors
in the estimation of an individual's ability:

It may be important to investigate the interaction of personality
and situational factors with tailored testing. The motivational impact
on the student when he discovers that most of the items are at a certain
level of difficulty (or uncertainty) is unknown. The optimal level
(or mixture of levels) for a given student will not be derived from test
theory alone; information about student anxiety and motivation may also
be relevant. (p. 199)

Whether adaptive tests can actually reduce the differential psychological
effects due to the administration of an inappropriately easy or difficult set
of test items depends largely on whether testees can accurately perceive the
difficulties of the items administered. Little research has dealt directly
with the question of item-difficulty perception.

Munz and Jacobs (1971) asked introductory psychology students to scale
multiple-choice examination questions on the subjective difficulty an
introductory psychology student would experience in reaching a solution to a
particular test question. Thurstone's method of equal-appearing intervals was
used to derive difficulty scale values for the individual items. These scale
values correlated positively but moderately (r = .52) with traditional
proportion-correct difficulty indices based on the subsequent administration of
those items to other introductory psychology students. However, Munz and Jacobs
made no attempt to determine the accuracy with which individuals perceived item
difficulties relative to their own levels of ability. Further, these results may
be generalized only to other achievement-testing situations where students have
been exposed to the material and have made an attempt to familiarize themselves
with it.

Bratfisch, Dornic, and Borg (1972) asked individuals to estimate the
subjective difficulty of items from sets A, [illegible], D, and E of Raven's
Standard Progressive Matrices. The items were first administered conventionally,
in the order of their "objective" difficulty as assessed by determining the
proportion of correct responses in a norming sample. Following this, the items
were presented in random order and estimates of their subjective difficulties
were obtained through a magnitude estimation procedure. The Spearman rank-order
correlation between the subjective difficulties of the items and the order of
their initial administration (i.e., their ranked "objective" difficulty) was
positive and high ($r_s$ = .90). Unfortunately, the effect of the items' prior
administration in the order of their objective difficulty cannot be determined.

In another study by the same authors (Bratfisch, Borg, & Dornic, 1972),
testees were administered numerical-reasoning, spatial-ability, or
verbal-comprehension items in the order of "objective" difficulty of the items
in the tests. Immediately after attempting to answer each item in the
conventional manner, the testees rated the item's difficulty on a nine-point
scale where 1 corresponded to a "very, very easy" item and 9 corresponded to a
"very, very hard" item. The Spearman correlations between order of
administration and perceived difficulty for the numerical-reasoning,
spatial-ability, and verbal-comprehension tests were .9[?], .92, and .92,
respectively. Unfortunately, in both studies by these authors, the subjective
difficulties were not explicitly related to the testees' perceptions of an
item's appropriateness to their ability levels. More importantly, in both
studies, it is impossible to separate the effect of item difficulty from that
of order of administration.

The present study was designed to determine whether or not testees can
perceive the difficulties of ability-test items relative to their levels of
ability and to investigate the accuracy of these perceptions for
individual items. Additionally, the study was designed to determine the level
of item difficulty perceived by testees as being appropriate for their ability.

Method

Test Construction

Two 41-item conventional tests were designed which had a large range of
differences between the difficulties of successive items. Items for the tests
were chosen from a pool of five-alternative, multiple-choice vocabulary items
on the basis of their normal-ogive difficulty (b) and discrimination (a)
parameters (Lord & Novick, 1968). One of the tests was designed to be
administered to a group of relatively low-ability college students. The other
test was designed to be administered to a group of relatively higher ability
students.
The item parameter estimates were based initially on data reported by
McBride and Weiss (1974), derived from samples of University of Minnesota
undergraduates. These parameter estimates were revised using a procedure
essentially the same as that described by Jensema (1976). Appendix A
describes the process of developing the revised item parameters. The difficulty
and discrimination parameters for each test item are shown in Appendix Table B-1.

The low- and high-ability tests had mean difficulties of $\bar{b} = -2.190$ and
$\bar{b} = -.488$, respectively. Mean discrimination values for the low- and
high-ability tests were $\bar{a} = 1.117$ and $\bar{a} = 1.501$, respectively.

Procedure

Subjects. Two groups of undergraduate students participated in this study.
The first group consisted of 119 students from psychology classes in the
University of Minnesota's General College (GC) who were tested in the winter
of 1975. The second group, tested in the spring of 1975, consisted of 185
students from an introductory psychology class in the University's College of
Liberal Arts (CLA). All students were volunteers who received points toward
their final course grades for participation in the experiment. GC students
typically perform more poorly on ability and aptitude tests than do CLA
students; for the purposes of this study, the GC students will therefore be
designated as the "low-ability" group while the CLA students will be referred
to as the "high-ability" group.

Test administration. All students were tested at individual cathode-ray
terminals (CRTs) connected to a Hewlett-Packard 9600E real-time computer system.
Instructional screens similar to those described by DeWitt and Weiss (1974,
pp. 36-53) explained the operation of the CRTs before the actual testing was
begun. In addition, a proctor was present in the testing room to provide
assistance in the operation of the equipment.

Each student answered 41 multiple-choice vocabulary test items. The
first six test items presented were identical for testees in a given ability
group. These items, whose difficulties reflected the difficulty range of the
test, served to familiarize the students with the range of difficulties they
would subsequently encounter. The remaining 35 items in each test were presented
in four different orders of administration to minimize the effect that the order
of item presentation might have on perceived item difficulty. Testees were
sequentially assigned to one of the four conditions. Although the same
procedure was followed in both ability groups, the items differed between
groups. Appendix Table B-1 shows the order of item administration in each of
the four conditions for each ability group.

Prior to the administration of the test, the students were informed that
they would have as much time as they needed to complete the task. During the
test, items were presented on the CRT screen and students responded by typing
the number corresponding to the chosen alternative for each five-alternative,
multiple-choice item. Immediately after responding to an item, each student
was asked to indicate the item's perceived difficulty by entering a difficulty
code selected from the following list:

A. Much too easy for you
B. Somewhat too easy for you
C. Just about right for you
D. Somewhat too hard for you
E. Much too hard for you

The testee's response was then checked by the computer to ensure that one of
the five alternatives had been chosen, and these data were stored with the
item-response data for later analysis.

The study was designed to investigate three different aspects of
item-difficulty perception. The initial phase was designed to determine whether
or not testees could accurately perceive the difficulty of ability-test items.
The second phase was concerned with whether or not a testee's ability level
was related to the perception of the relative difficulty of a given item;
that is, how accurate an individual's perceptions were, relative to his/her
ability level. The third phase of the analysis attempted to determine the
relative item difficulty which was perceived by the testee as being about
right for his/her ability level.

Accuracy of Difficulty Perceptions

Method of Analysis

Difficulty perception model.¹ An individual's perception of an item's
difficulty can be thought of as the signed distance between the person's
ability level and the item's difficulty level in a Euclidean ability/difficulty
space. This perception will be denoted by

$$d_{ij} = \sum_{p=1}^{P} w_{jp}\,(x_{jp} - x_{ip}) \qquad [1]$$

where $d_{ij}$ is the perceived difficulty of item $j$ for person $i$,
      $x_{jp}$ is the difficulty of item $j$ along ability/difficulty dimension $p$,
      $x_{ip}$ is the ability of person $i$ along ability/difficulty dimension $p$,
      $w_{jp}$ is the weight of item $j$ along dimension $p$, and
      $P$ is the number of dimensions in the ability/difficulty space.

Thus, in this model, the difficulty of an item for a given person is defined
as the weighted sum of the signed distances between the location of the item
and the location of the person along $P$ ability/difficulty dimensions. For the
present analysis, numerical values of $d_{ij}$ were assigned to each alternative
on the rating scale. The values assigned to alternatives A through E were -2,
-1, 0, +1, and +2, respectively. Thus, $d_{ij}$ increased as the perceived
difficulty of an item increased, and $d_{ij}$ was equal to zero when an item was
perceived by a testee as "just about right for [me]."

¹ Appreciation for the development of this model is expressed to Mark Davison,
Assistant Professor of Educational Psychology, University of Minnesota.

The use of a model such as that in Equation 1 is advantageous for several
reasons. Using the difficulty ratings alone, estimates of individual ability
levels and item difficulties can be derived on a common metric. In addition,
the general, multidimensional form of the model may be particularly useful in
describing difficulty perceptions on multi-ability test batteries or other
such multi-trait instruments.

Note that $P$ in the model corresponds to the number of dimensions in the
space. If the item difficulty ratings are unidimensional, $P$ will equal 1 and
$d_{ij}$ can be expressed more simply as

$$d_{ij} = w_j\,(x_j - x_i) \qquad [2]$$

Further, if the items are assigned unit weights, the expression in Equation 2
becomes

$$d_{ij} = x_j - x_i \qquad [3]$$

If the model and the assumption of unidimensionality are appropriate and
the average ability level within a group of testees is arbitrarily set at zero,
a least squares estimate of a single item's difficulty ($x_j$) is found to be

$$x_j = \frac{1}{N}\sum_{i=1}^{N} d_{ij} \qquad [4]$$

where $N$ is the number of persons rating the item. Thus, an estimate of an
item's difficulty is simply the average difficulty rating assigned to that
item by the individuals being tested.

Similarly, a least squares estimate of $x_i$, the ability level of person $i$, is

$$x_i = -\frac{1}{n}\sum_{j=1}^{n} d_{ij} + \frac{1}{n}\sum_{j=1}^{n} x_j \qquad [5]$$

where $n$ is the number of items administered. An estimate of an individual's
ability level is thus the negative of the average difficulty rating he/she
assigns to a set of items plus the average item difficulty in that set.
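As an illustration of Equations 4 and 5, the following minimal sketch computes ratings-based item difficulties and person abilities from a small matrix of ratings coded -2 to +2. The data and the function names are hypothetical; only the averaging rules come from the equations above, and complete data (every testee rating every item) is assumed.

```python
from statistics import mean

# Numerical values assigned to rating alternatives A through E.
RATING_VALUES = {"A": -2, "B": -1, "C": 0, "D": 1, "E": 2}


def item_difficulties(ratings):
    """Equation 4: mean rating per item (rows are persons, columns are items)."""
    n_items = len(ratings[0])
    return [mean(row[j] for row in ratings) for j in range(n_items)]


def person_abilities(ratings, difficulties):
    """Equation 5: mean item difficulty minus the person's mean rating."""
    mean_difficulty = mean(difficulties)
    return [mean_difficulty - mean(row) for row in ratings]


# Hypothetical letter ratings from three testees on four items.
letters = ["CDEE", "BCDE", "ABCD"]
ratings = [[RATING_VALUES[c] for c in row] for row in letters]

x_items = item_difficulties(ratings)
x_persons = person_abilities(ratings, x_items)
```

Because the item estimates are plain column means, the resulting person estimates average to zero, which matches the convention of setting the group's mean ability at zero.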


Accuracy of ratings-based estimates. The estimates of item difficulties
and individual ability levels described by Equations 4 and 5 are based solely
on the testees' ratings of relative item difficulties. In order to determine
the appropriateness or accuracy of these perceptions, the ratings-based
estimates of item difficulties and students' abilities were compared to more
conventional estimates based on the correctness/incorrectness of the testees'
conventional responses to the test items.

The ratings-based estimates of item difficulty were correlated with the
proportion of persons in the present study identifying the correct response
alternative and also with the normal-ogive estimates of item difficulty ($b_j$)
based on the item calibration described in Appendix A. The ratings-based
estimates of student ability were correlated with traditional number-correct
scores and maximum-likelihood ability estimates (Betz & Weiss, 1976a)
based on the normal-ogive parameters of the items.

Dimensionality of difficulty perceptions. In order to use the simple,
unidimensional form of the difficulty-perception model described above, the
unidimensionality of the difficulty ratings must be demonstrated. Because there
is no definitive test of unidimensionality, an indirect evaluation was necessary.
McBride and Weiss (1974) suggested four criteria which, if met, constitute
sufficient evidence of unidimensionality in item-response data. According to the
criteria suggested, confirmatory evidence of unidimensionality is present when:
1) the first common factor of the matrix of inter-item correlations is a general
factor accounting for a large proportion of the common variance and on which all
variables load highly; 2) the second and subsequent factors account for much
smaller and essentially equal proportions of the common variance; 3) the item
loadings on the first factor are either all positive or all negative; and
4) none of the above criteria are satisfied by the analysis of a similar
correlation matrix constructed from computer-generated random data. Although
these criteria were suggested in the context of the analysis of item-response
data, they are equally applicable to the analysis of the difficulty ratings.

Accordingly, a 41x41 matrix of product-moment inter-item correlations
among the difficulty ratings was factor analyzed for each ability group.
Communalities for each item were estimated by the squared multiple correlation
of that item with all others in the matrix. Factors were extracted by the
principal axes procedure and the resulting communalities were substituted
for the prior communality estimates. This procedure continued in an iterative
fashion until the differences between the two communality estimates were
negligible.
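The iterative procedure just described can be sketched as follows, using NumPy. This is a generic iterated principal-axis factoring routine written for illustration, not the program used in the study; the number of factors retained, the convergence tolerance, and the simulated data are all assumptions.

```python
import numpy as np


def principal_axis_factoring(R, n_factors=3, tol=1e-6, max_iter=500):
    """Iterated principal-axis factoring of a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlation of each item
    # with all other items, obtained from the inverse of R.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)          # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[top], 0.0, None)  # guard against tiny negatives
        loadings = eigvecs[:, top] * np.sqrt(lam)
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


# Simulated one-factor "ratings" for 500 persons and 8 items (not study data).
rng = np.random.default_rng(0)
general = rng.normal(size=(500, 1))
data = 0.7 * general + rng.normal(size=(500, 8))
R = np.corrcoef(data, rowvar=False)
loadings, communalities = principal_axis_factoring(R, n_factors=1)
```

The sign of each extracted factor is arbitrary, so the check that matters for the third criterion is whether the first-factor loadings all share one sign, not which sign it is.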

Results

Dimensionality of difficulty perceptions. Evidence of the dimensionality
of the difficulty ratings is shown in Figures 1a and 1b. These figures show
the first ten eigenvalues of the inter-item correlation matrix based on the
difficulty ratings for the low- and high-ability groups, respectively. In both
figures, the eigenvalues from the analysis of the ratings are represented by a
solid line, while the dashed line shows those resulting from an analysis of
comparable, computer-generated random data.

In both ability groups, the first factor of the real data extracted by
far the largest amount of variance, while the second factor extracted only
slightly more variance than did subsequent factors. The first factors extracted
from the random data, on the other hand, accounted for little more variance
than other random-data factors. The amount of variance extracted by the second
and subsequent factors in the real data was similar to that extracted by the
second and subsequent factors in the random data.
Figure 1
Factor Contributions as a Function of Factor Number for
the Difficulty Ratings and for Comparable Random Data
(a) Low-Ability Group   (b) High-Ability Group
Table 1 lists the loadings of the items from each test on the first three
factors extracted from the matrix of inter-item correlations of difficulty
ratings for that test. Each of the items loaded positively on the first
factor from that test's data, and the first factor loadings were generally high.
These data therefore suggest the existence of a "general" factor. Also shown
in Table 1 are the loadings for the first three factors from the comparable
random data for each group. For these latter data, the first factor was
bipolar in both groups; i.e., positive and negative loadings occurred as
frequently on the first factor as on factors 2 and 3. In the real data,
bipolarity occurred only on the second and subsequent factors. These results
therefore suggest that for both ability groups, the difficulty ratings may
be characterized as being unidimensional.

Accuracy of ratings-based estimates. Because the difficulty perceptions
appeared to be unidimensional, the difficulty ratings were used in conjunction
with Equations 4 and 5 to calculate ratings-based estimates of item difficulty
($x_j$) and testee ability ($x_i$). The estimates of item difficulties, based
solely on the difficulty ratings, are shown in Table 2. Table 2 also shows
proportion-correct ($p_j$) and normal-ogive ($b_j$) item-difficulty estimates
for each item.



In the low-ability group, estimates of item difficulty derived from the
difficulty perceptions were highly related to proportion-correct and
normal-ogive item-difficulty estimates; Pearson product-moment correlations were
r = -.86 and r = .80, respectively. The relationships between the ratings-based
difficulty estimates and the estimates based on conventional responses to the
items were similarly high for items in the high-ability group, with respective
Pearson product-moment correlations of r = -.9[?] and r = .85.

Appendix Table B-2 shows, for each testee, number-correct scores (n)
and maximum likelihood estimates of the testee's ability level ($\theta_i$)
based on his/her conventional responses to the items and the corresponding
ability estimates based on the difficulty perceptions ($x_i$). The Pearson
product-moment correlations of the ratings-based ability estimates with the
corresponding number-correct scores and with maximum-likelihood ability
estimates were r = .55 and r = .56, respectively, for testees in the low-ability
group. For persons in the high-ability group, comparable correlations were
r = .63 and r = .59, respectively.



Difficulty Perceptions of Individual Items

The second phase of the analysis assessed the relationship between the
ability levels of testees and the perceived difficulty of a given item. As an
individual's ability level increases relative to the difficulty level of an
item, the item should be perceived by the individual as being relatively less
difficult. As student ability levels decrease in comparison to an item's
difficulty, the item should appear to the testees as being relatively more
difficult. Thus, the difficulty rating assigned by a testee to an individual
item should be dependent upon the discrepancy between the testee's ability
level and the item's difficulty.



Table 1
Item Loadings on the First Three Factors for the
Difficulty-Perception Data and for Comparable Random Data

Table 2
Least-Squares Item Difficulty Estimates Based on the
Difficulty Perceptions ($x_j$) and Corresponding Proportion-Correct ($p_j$)
and Normal-Ogive ($b_j$) Item Difficulty Indices

Table 3
Correlations of Difficulty Ratings
with Ability-Level/Item-Difficulty Discrepancy
($r$) and Dichotomized Item Scores ($r_{bis}$)
Method of Analysis

The normal-ogive testing model permits the estimation of individual ability
levels and item difficulty levels on a common metric. Thus, an estimate of the
discrepancy between an individual's ability level and an item's difficulty is
$\theta_i - b_j$, where $\theta_i$ represents the ability level of person $i$,
and $b_j$ represents the difficulty of item $j$.

To assess the relationship between the ability-level/item-difficulty
discrepancy and the testee's difficulty perception for a single item
($d_{ij}$), the Pearson product-moment correlation ($r$) between
$\theta_i - b_j$ and $d_{ij}$ was computed for each item. Because the estimate
of $\theta_i$ and the estimate of $b_j$ are fallible and because it is possible
that testees' perceptions are more directly related to whether or not they can
answer the item correctly than to $\theta_i - b_j$, the biserial correlation
($r_{bis}$) between the testees' item scores (0 if incorrect, 1 if correct) and
their difficulty perceptions was also computed.
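The two per-item correlations can be sketched as follows. The biserial coefficient uses the standard textbook formula (difference between the mean ratings of correct and incorrect responders, divided by the standard deviation of all ratings, times pq over the normal ordinate at the point dividing p and q); the source does not give its computing formula, so this choice, the use of the population standard deviation, and the data are assumptions.

```python
from statistics import NormalDist, mean, pstdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def biserial_r(scores, ratings):
    """Biserial correlation between 0/1 item scores and difficulty ratings."""
    right = [r for s, r in zip(scores, ratings) if s == 1]
    wrong = [r for s, r in zip(scores, ratings) if s == 0]
    p = len(right) / len(scores)            # proportion answering correctly
    q = 1.0 - p
    ordinate = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(right) - mean(wrong)) / pstdev(ratings) * (p * q / ordinate)


# Hypothetical data for one item: theta - b, item score, and rating per testee.
discrepancy = [1.2, 0.3, 2.0, -0.8, -1.5, 0.1, -0.2, 0.9, 0.4, 0.0]
scores = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
ratings = [-1, 0, -2, 1, 2, 1, 0, -1, -1, 0]

r_discrepancy = pearson_r(discrepancy, ratings)
r_bis = biserial_r(scores, ratings)
```

Unlike the Pearson coefficient, a biserial coefficient computed this way is not bounded by -1 and +1 in finite samples, which is worth keeping in mind when reading extreme values for individual items.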

Results

Table 3 shows the correlations of the $\theta_i - b_j$ discrepancy and the
difficulty ratings, $d_{ij}$, for items on both tests. The median correlations
were -.34 for the low-ability group and [illegible] for the high-ability group.
Correlations ranged from -.56 to -.03 for the low-ability group and from -.50
to -.11 for the high-ability group.

Table 3 also shows the biserial correlations of the item scores and the
difficulty ratings for each test item. The median biserial correlations were
-.40 and -.48 for the low- and high-ability groups, respectively. These
correlations ranged from -1.00 to .20 for the low-ability group and from -1.00
to .22 for the high-ability group.

Perceptions of Appropriate Item Difficulty

Adaptive testing procedures generally tailor a test such that item
difficulty parameters are close to the estimated ability level for a given
testee, i.e., so that $\theta_i - b_j$ is close to zero. Although these items
may be "about right" in difficulty from a psychometric standpoint, they may not
be "about right" from the individual testee's point of view. The third phase of
the analysis was designed to determine the testee-ability/item-difficulty
discrepancy for an item which was perceived by the testee as being "just about
right" for him/her.

Method of Analysis

For each test item, an average $\theta_i - b_j$ was computed for those persons
giving the item a rating of "C," indicating that they perceived the difficulty
of the item as "just about right" for them.



Table 4 shows the average $\theta_i - b_j$ discrepancy of subjects assigning a
rating of "C" to the item, for each of the items on the two tests. It is obvious
from the data in Table 4 that the "about right" perceptions differ greatly from
item to item.

Positive values of these mean discrepancies indicate that an item was
perceived as "about right" when the difficulty level of the item ($b_j$) was,
on the average, below the testees' estimated ability level ($\theta_i$). For the
low-ability group, 28 of the 41 items had positive mean discrepancies; these
discrepancies ranged from .34 to 5.77. For the high-ability group, 20 of the
41 items had positive mean discrepancies, ranging from .14 to 4.04.

Negative values indicate a judgment of "about right" for items which are
above a testee's ability level. For the low-ability group, these ranged from
-.31 to -2.04. For the high-ability group, the range was -.06 to -2.44.

The average signed mean discrepancy was 1.358 for the low-ability testees
and .2899 for the high-ability testees. These averages are somewhat ambiguous
because differing numbers of testees contributed to the computation of means
for individual items. The overall mean discrepancies judged to be "about
right," weighted by the number of persons upon which each item mean was based,
were 1.703 and .466 for the low- and high-ability groups, respectively.
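The per-item means and the two overall averages can be sketched as follows. The ability estimates, item difficulties, and ratings are hypothetical, and the function name is illustrative only; the sketch simply shows why the unweighted and weighted averages differ when items have different numbers of "C" raters.

```python
from statistics import mean


def about_right_means(thetas, b_values, ratings):
    """Per item: (mean theta - b among testees rating it 'C', number of them)."""
    results = []
    for j, b in enumerate(b_values):
        diffs = [theta - b for theta, row in zip(thetas, ratings) if row[j] == "C"]
        results.append((mean(diffs), len(diffs)) if diffs else (None, 0))
    return results


thetas = [-1.0, 0.0, 1.0, 2.0]          # hypothetical ability estimates
b_values = [-0.5, 0.5]                  # hypothetical item difficulties
ratings = ["CD", "CD", "CC", "AC"]      # one string of ratings per testee

per_item = about_right_means(thetas, b_values, ratings)
rated = [(m, n) for m, n in per_item if n > 0]

# Unweighted: every item mean counts equally.
unweighted = mean(m for m, _ in rated)
# Weighted: each item mean counts in proportion to its number of raters.
weighted = sum(m * n for m, n in rated) / sum(n for _, n in rated)
```

The weighted figure is equivalent to pooling every "C" rating across items and averaging the discrepancies directly.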



Discussion

Least squares estimates of item difficulties, based on the difficulty
ratings assigned to the items and the unidimensional difficulty-perception
model, were closely related to difficulty indices based on conventional
responses to the items. Thus, students were able to accurately perceive
the relative difficulties of a set of test items. There was some suggestion
in the data that high-ability testees perceived item difficulties relatively
more accurately than did low-ability testees.

Similarly, ratings-based ability estimates corresponded relatively well
with more traditional ability estimates. Because these ratings-based ability
estimates were essentially an average of the difficulty ratings assigned to the
items, the positive correlations between these estimates and, for instance,
the number-correct scores indicate that as ability levels increased, the items
were rated as being relatively less difficult, on the average.

The, correlations^ between jthe ratings-based ability estimates and the number- 
correct scores also indicate that testees can, with a fair degree of accuracy, 
perceive how well they have performed on an ability test. The correlations of 
.55 for the low-ab^^lity group Suggests that students in this grdup were slightly 
less able to perjceiv^ their ability levels as assessed^by number-correct scored 
than were testees in, jthe high-ability group, where numl^^r-correct scores and 
ratings-based ability estimates correlated .63. In general, however, the 
magnitude af the relationships between the d^fTiculty ratings and objective / 
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' Table 4 * . • 

Mean Signed Discrepancy by Item -Between T^stee Ability ind 

Iten Diffictiity (O^'^p for Students Rating - ' \- 

an Item "Just About Right for [me]," for Two Ability Groups . • 



Low-Ability Group. 


! - 

0 


> 


High-Ability Group 






* Item 
Reference 
Number . 


Mean ^ 
Discrepancy 




X tern 
iverer^nce 


Mean 
Discrepancy 


Humbeir of 
Students 


^ 2 


s2*.87 


50 




3. .-38 






60 




4' 


4.63 ' 


4* 


7 


1.52 






47 






1.24 


36 


14 ' 


. -1.68 


• 


« 


51 




14 




46 


. 18 


4.04 






58 






^^3 


53 


19 


3.29 






39 




19 




42 


23 


3. 16 


> 




' 61 






4 03 




24 


1.85. 






43 




• r ( la^ 23 


Ql7 


54" 


39 


3.29 






76 




'-''^24 


1 


46 


4*4 


d.l5 






101 






L \l 


^ 50 


■ 51 


♦ .79 






90 




41 . 




. .49 


56 


-.06 






59 


i 




7S 


52 


64 ' 


1.77 






t, 

34 




SI 


• JO 


49 •» 


68 


2.01 




-* 


82 




55 


3 


■ - 60 ° / 


77 


2.96 






76 






- 7S " 


•35 • 


86 " 








60 




62 


4.00 


• J8 


• 91 ; 


• . -.29 






73 




64 » 


1 37 


39- 


' , 104 


' ■ .14 


• 




32 




/ , 68 


1.46'* 


53 


108 


.85 






78 




' 72 


5 13 


• 42 ' 


• 111 


' ' -.88 






48" 




77 




60 


114 


-.87 






48 




^ 78 ' 


3.88 


62 -3 


115 


-K85 


r 




11 




86' 


.61 


■37 


■ 120 


-1.92 






88 




'89 


• 1 . 69 


51 


137 


.42 




0 


3L 




91 


-.82 


' 53 ^ 


145 


-.26 






J 7 




108 


♦.34 


54 . 


147 ' 


-1.-80 






84 




111 


-1.49 


32 " 


154 


-. 15 






95 




• 114. , 


-1.25 


' 32 


162 . ^ 


' -.75 






26 




141 


.50 


55 


167 - 


-2.44 






51 




145 


-.73 ' 


43 


174 ' 


V . -1.37 






.46 . 




■ 154 


-■59; 


63 ■ 


182 


3.16 






55 




• 162 


-1 AS 


14 


188 


" .29 






32 • 




IZA 


-2^4 




. 191 , 


.94 


\ 


— ^ 


73 




182 


-f 9 on 


61 • 


217 


-1.31 




V 


38 




188 


-.31 


. K 


253 


' -1.99..^ 






27 




191 


. .46 


49 


302 


-.96^' 






40 




192 


5'. 77 


47, 


• 319 


-1.59 






29 , 




198 


- 1.59 . 


57 


- ' 3L371 


-l.«59 






63 




302 


jl/-1.22 • 


, • 29 


359 


-2.35 










. '337 


. -1.66 


35 ■ 


375 


''-.62 






15 




375 


-1.37 ~ 


11 . 


383 


^'-1.20 






■49 




651' 


-1.59 


2'9 


514 ■ 


, -1.62t 






56 




Mean • 


\ r.36 






.29 










S.D. 

'weighted Mean 


2.26 






1.84 










1.70 • ' 




' 4 


.47 










O S.D. 
2.28 




20, 


2.05 
estimates of item difficulty and between the ratings and estimates of testees'
abilities indicates that testee perceptions of test difficulty and their
test performance are, at least generally, accurate.

The second phase of the analysis showed that for an individual item,
however, there was relatively little relationship between testee perceptions
of item difficulty and testee-ability/item-difficulty discrepancies or the item
scores. The median proportions of variance accounted for by the linear
relationship between the discrepancy and the difficulty perceptions (r²)
were only .12 and .1[digit illegible] for the two ability groups. The median
proportions of variance accounted for by the relationship between the
dichotomized item scores and the difficulty perceptions were .16 and .23 for
the two groups. In these latter data, however, there again seems to be a
difference in favor of the high-ability group in that their difficulty
perceptions were more highly related to their test behavior.

The finding most relevant for the design of ability-testing procedures was
that items which were judged by the testees to be "about right" in difficulty
were not necessarily "about right" from a psychometric point of view. These data,
in fact, show that testees perceived items that were somewhat below their ability
level as being, on the average, about right for persons of their ability level.
In the case of the low-ability students, the items perceived as appropriate had,
on the average, normal-ogive difficulty parameters which were over 1.5 standard
deviations below the testees' maximum likelihood ability estimates. The
high-ability students judged items as "about right" if, on the average, they were
about one-half standard deviation below their ability levels. Low-ability
students tended to judge items as "about right" in difficulty when the items
were below their ability levels; the high-ability students divided their "about
right" judgements equally between items which were psychometrically too easy and
those which were psychometrically too difficult.
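
The comparison described above can be sketched in a few lines. This is only an illustration, not the report's own computation: the function name, the rating labels, the sample values, and the sign convention (item difficulty minus ability estimate, so that negative values mean the item lies below the testee's ability) are all assumptions made for the example.

```python
from statistics import mean

def mean_discrepancy(theta_hat, b_values, ratings, target="about right"):
    """Mean of (b - theta_hat) over the items one testee rated as `target`.

    theta_hat : the testee's ability estimate
    b_values  : difficulty parameters of the items administered
    ratings   : the testee's difficulty perception for each item
    Sign convention (an assumption): negative means the item is easier
    than the testee's estimated ability level.
    """
    diffs = [b - theta_hat for b, r in zip(b_values, ratings) if r == target]
    return mean(diffs) if diffs else None

# Hypothetical testee with ability estimate 0.5 and four rated items.
b_values = [-2.4, -1.0, 0.1, 0.9]
ratings = ["too easy", "about right", "about right", "too difficult"]
print(mean_discrepancy(0.5, b_values, ratings))
```

A negative mean for the "about right" items corresponds to the pattern reported here, in which items judged appropriate tended to lie below the testees' estimated ability levels.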

Conclusions

These data show that students' perceptions of the relative difficulties of
a set of ability test items are quite accurate, but that their perceptions of
the difficulties of individual ability-test items are only moderately accurate.
The data also suggest that the ability level of the testee has some effect on
difficulty perceptions. Ability level also is related to the accuracy of
perception of a testee's own test score. Thus, testees of different ability
levels seem to encounter a different psychological environment when interacting
with an ability test. This conclusion is further supported by the students'
perceptions of the items which are "about right" for their ability levels.

The psychometric and the psychological effects of adapting an ability test
to a level where the testee perceives the test difficulty as "about right"
should be studied. Adaptive testing strategies usually tailor a test such that
the estimated difficulty of each item administered is close to the current
estimate of an individual's ability level. In adapting a test to ensure that
item difficulties are psychometrically optimal, these strategies may also, in
effect, be tailoring the test so that all of the items are perceived by testees
as being too difficult for persons of their ability level. The psychological
effects of such a procedure should be investigated more fully.
APPENDIX A

Item Calibration Procedures

Initial Item Parameter Estimates



The item parameterization procedures that were used assumed a normal-ogive
latent trait model and the existence of a bivariate-normal joint distribution
of $\theta$ (the latent ability) and $x$ (the continuous variable assumed to
underlie dichotomous item responses). Given these assumptions, discrimination
($a$) and difficulty ($b$) parameters may be defined by Equations 6 and 7,

$$ a_j = \frac{\rho_j}{\sqrt{1 - \rho_j^{2}}} \qquad [6] $$

$$ b_j = \frac{\gamma_j}{\rho_j} \qquad [7] $$

where $\rho_j$ is the correlation between individuals' ability levels ($\theta$)
and their scores ($x$) on item j,

$\gamma_j$ is the z-score above which lies the proportion of testees in the
population knowing the correct answer to item j (Lord & Novick, 1968).
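
As a rough illustration of Equations 6 and 7, the sketch below assumes the standard normal-ogive relations a = ρ/√(1 − ρ²) and b = γ/ρ given by Lord and Novick (1968). The function name and the sample values are hypothetical and are not taken from the report.

```python
import math
from statistics import NormalDist

def normal_ogive_params(rho, p_known):
    """Return (a, b) for one item under the normal-ogive model.

    rho     : correlation between ability and the latent item variable
    p_known : proportion of testees knowing the correct answer
    """
    # gamma is the z-score ABOVE which the proportion p_known lies,
    # so it is the (1 - p_known) quantile of the standard normal.
    gamma = NormalDist().inv_cdf(1.0 - p_known)
    a = rho / math.sqrt(1.0 - rho ** 2)   # discrimination (Equation 6)
    b = gamma / rho                       # difficulty (Equation 7)
    return a, b

# Hypothetical item: rho = .60, 84% of testees know the answer.
a, b = normal_ogive_params(rho=0.6, p_known=0.84)
print(round(a, 3), round(b, 3))
```

For an item that most testees know, γ is negative, so the difficulty parameter comes out below zero, as expected for an easy item.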

In order to estimate $\rho_j$, the biserial correlation ($r_{b_j}$) between
testees' ability levels and their dichotomized item scores was found by first
estimating the point-biserial correlation ($r_{pb_j}$) between ability levels
and dichotomous item scores by Equation 8, based on data reported by McBride
and Weiss (1974),

$$ r_{pb_j} = \frac{\bar{X}_{p_j} - \bar{X}_{q_j}}{s_{X_j}} \sqrt{p_j (1 - p_j)} \qquad [8] $$

where $\bar{X}_{p_j}$ is the mean number-correct score of persons correctly
answering item j,

$\bar{X}_{q_j}$ is the mean number-correct score of persons incorrectly
answering item j,

$p_j$ is the proportion of persons correctly answering item j,

$s_{X_j}$ is the standard deviation of number-correct scores for the total
group answering item j.

The biserial coefficient was then computed using the transformation in Equation 9,

$$ r_{b_j} = \frac{r_{pb_j} \sqrt{p_j (1 - p_j)}}{\phi[z_j]} \qquad [9] $$

where $z_j$ is the z-score above which lies the proportion of testees in the
norming sample correctly answering item j ($p_j$),

$\phi[z_j]$ is the density of a normal probability density function at $z_j$.
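
The two-step estimate described above (point-biserial first, then the biserial transformation) can be sketched as follows. This is a minimal illustration using the standard textbook forms of the two coefficients; the function names and the six-testee data set are hypothetical, and the standard deviation is taken with divisor N as an assumption.

```python
import math
from statistics import NormalDist, mean, pstdev

def point_biserial(total_scores, item_scores):
    """Point-biserial correlation between total scores and a 0/1 item score."""
    right = [t for t, u in zip(total_scores, item_scores) if u == 1]
    wrong = [t for t, u in zip(total_scores, item_scores) if u == 0]
    p = len(right) / len(item_scores)      # proportion answering correctly
    sd = pstdev(total_scores)              # SD of the whole group (divisor N)
    r_pb = (mean(right) - mean(wrong)) / sd * math.sqrt(p * (1.0 - p))
    return r_pb, p

def biserial(r_pb, p):
    """Convert a point-biserial coefficient to a biserial coefficient."""
    z = NormalDist().inv_cdf(1.0 - p)      # z-score above which proportion p lies
    return r_pb * math.sqrt(p * (1.0 - p)) / NormalDist().pdf(z)

# Hypothetical data: number-correct scores and one item's 0/1 scores.
totals = [2, 4, 5, 7, 9, 10]
item = [0, 0, 1, 0, 1, 1]
r_pb, p = point_biserial(totals, item)
r_b = biserial(r_pb, p)
print(round(r_pb, 3), round(r_b, 3))
```

The biserial value is always at least as large in magnitude as the point-biserial it is computed from, which is why the later steps restrict it to a bounded interval.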

Because a testee could answer an item correctly simply by random guessing
on these 5-alternative, multiple-choice items, a guessing parameter ($c$) was
defined for each item by Equation 10,

$$ c_j = \frac{1}{n_j} \qquad [10] $$

where $n_j$ is the number of response alternatives on item j.

In order to account for guessing when the initial $a$ and $b$ parameters used
to construct the tests described in this report were derived, the estimate of
$\rho_j$ ($r_{b_j}$) computed in Equation 9 was modified according to Equation 11.

[Equation 11 is not legible in the source.]

The estimate of $\rho_j$ resulting from Equation 11 ($r'_j$) was restricted to
the interval from -1.0 to +1.0 and used, along with $z_j$ (as an estimate of
$\gamma_j$), to calculate values of $a$ and $b$ for each item using Equations 6
and 7. The resulting values of $a_j$ were then restricted to the interval from
-3.0 to +3.0.¹

¹The restrictions on $r'_j$ and $a_j$ thus affected both the values of the $a$
and $b$ parameters, but the effects of the restrictions were not necessarily
consistent.

Revised Item Parameter Estimates

The item parameter estimates derived from the above procedures were used
to select items for the tests administered in this study. In the time interval
between the construction of the tests and the analysis of the data, it became
apparent that certain revisions to these item parameter estimates were necessary
for each item. These revised estimates were computed for all 569 items in the
pool from which items for this study were selected.

In computing the revised estimates of $a$ and $b$ used to analyze the present
data, the proportion of testees who actually knew the correct answer to an item
($p'_j$) was estimated from the proportion of testees in the population who
actually answered the item correctly ($p_j$) and the estimate of $c_j$ using
Equation 12,

$$ p'_j = \frac{p_j - c_j}{1 - c_j} \qquad [12] $$
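
A short numerical sketch of the guessing correction may help. It assumes the guessing parameter is the reciprocal of the number of response alternatives (Equation 10) and applies Equation 12 as stated; the function names and the example proportion are hypothetical.

```python
def guessing_parameter(n_alternatives):
    """Chance of a correct answer by random guessing (Equation 10)."""
    return 1.0 / n_alternatives

def proportion_knowing(p_correct, c):
    """Estimated proportion who actually know the answer (Equation 12)."""
    return (p_correct - c) / (1.0 - c)

# Hypothetical 5-alternative item answered correctly by 60% of testees.
c = guessing_parameter(5)
p_prime = proportion_knowing(0.60, c)
print(c, round(p_prime, 3))
```

With five alternatives, an observed proportion correct of .60 corresponds to roughly half of the testees being estimated to know the answer; an observed proportion equal to the chance level corresponds to none.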






An estimate of $\rho_j$ suggested by Urry (1975) was then computed by Equation 13.

[Equation 13 is only partly legible in the source; the surviving fragment
reads $r_j \sqrt{(p_j)(1 - p_j)}$.]

where $z'_j$ is the z-score above which lies the proportion of testees in the
sample who were estimated to actually know the answer to item j ($p'_j$),

$\phi[z'_j]$ is the density of a normal probability density function at $z'_j$.

This estimate of $\rho_j$ ($r'_j$) was then used, along with $z'_j$ as an
estimate of $\gamma_j$, in Equations 6 and 7 to calculate the revised $a$ and
$b$ parameters. If $p_j < c_j$, $p'_j$ was set equal to .001. If
$|r'_j| > .9486833$, $r'_j$ was set equal to .9486833 with the appropriate
sign. This restricted the $a$-values to the interval from
-3.0 to +3.0 and influenced the $b$-values through Equation 7.
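
The limit of .9486833 is the square root of .9 to seven decimals, which is why it bounds the discrimination parameter at 3.0. The sketch below checks this numerically, assuming Equation 6 has the standard form a = r/√(1 − r²); the function and constant names are hypothetical.

```python
import math

R_LIMIT = 0.9486833  # sqrt(0.9) to seven decimals, the limit quoted in the text

def restrict_r(r, limit=R_LIMIT):
    """Clamp a correlation estimate to [-limit, +limit], keeping its sign."""
    if abs(r) > limit:
        return math.copysign(limit, r)
    return r

def discrimination(r):
    """Discrimination parameter from a correlation (assumed form of Equation 6)."""
    return r / math.sqrt(1.0 - r * r)

# Hypothetical estimates: two beyond the limit, one inside it.
for r in (0.99, -0.99, 0.50):
    print(r, round(discrimination(restrict_r(r)), 3))
```

Estimates inside the limit pass through unchanged, so only extreme correlations are affected; because b is obtained by dividing by the same correlation in Equation 7, clamping it also shifts the corresponding difficulty value.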

This latter procedure differs from that suggested by Jensema (1976) only
in that Jensema chose to remove each item from the computation of the test
score estimating $\theta$ during the computation of that item's parameters. For
test scores based on large numbers of items, the effects of this exclusion
should be negligible.

Comparison of Original and Revised Item Parameters

For items in the pool with $b$ parameters between ±3.0, Figure A-1 presents
the bivariate plot of the original and the revised $b$ parameters. As Figure A-1
shows, the revised $b$ estimates were closely related to the original $b$-values
(Pearson product-moment r=.98). The bivariate plot of original and revised
$a$-values is shown in Figure A-2. As this figure shows, the revised $a$-values
were not as closely related to the original $a$-values (Pearson product-moment
r=.74) as were the revised $b$-values.

To determine the effects of the revised item parameters on ability estimates
computed using those parameters, maximum likelihood ability estimates were
computed using both sets of item parameters for the 185 CLA students involved
in this study. The bivariate plot of the two sets of maximum likelihood ability
estimates is shown in Figure A-3. The resulting Pearson product-moment
correlation of .96 indicated that the ability estimates did not differ greatly
depending on whether the original or revised normal-ogive item-parameter
estimates were used. This high correlation suggests that essentially the same
conclusions would be drawn in this study from the use of either the original set
of parameters or the revised set of parameter estimates based on Urry's (1975)
correction procedure.



These procedures were suggested by James Sympson of the University of
Minnesota.
Figure A-1

Distribution of Original and Revised Difficulty
Parameter (b) Estimates

[Scatter plot not reproduced. Axis label: Revised b Estimate.]

Figure A-2

Joint Distribution of Original and Revised Discrimination
Parameter (a) Estimates

[Scatter plot not reproduced. Axis label: Revised a Estimate.]

Figure A-3

Joint Distribution of Maximum-Likelihood Ability Estimates (θ)
Based on the Original and the Revised Item-Parameter Estimates

[Scatter plot not reproduced.]
APPENDIX B

Table B-1

Order of Administration and Normal Ogive Discrimination (a) and
Difficulty (b) Parameters for Items on Tests for the Low- and High-Ability Groups

Column groups: Low-Ability Group, then High-Ability Group. Within each group:
Item Reference Number; Item Sequence (A, B, C, D); Item Parameters (a, b).

2 


1 1 


32 


37 


16 


.517 


-3.810 , 


2 


41 


7 


27 


21 


,517 


-3.810 


U 


24 


24 


10 


38 


. 397 


-5. 561 


7 


39 


8 


26 


22 


3.000 


-2.324 


7 


3 


3 


3 


3 


3.000 


-2.324 


14 


22 




8 


40 


2.208 


-2.461 


* 14 


40 


9 


25 


23 


2. 208 


-2 .461 


18 


1 


1 


1 


1 


.483 


-4.241 


18 


41 


7 


27. 


21 


.483 


-4.241 


• 19 


28 


20 


14 


34 


.710 


-3.808- 


19 


* 16 


37^ 


32 


I'l 


.710 


-3.808 


23 


18 


39 


30 ' 


9 


.713 


-3.862 


20 


. r 


1 


1 


1 


.381 


-5.764 


24 


30 


18 


16 


32. 


1. 749 


-2.366 




22 


26 


8 \ 


40 


.713 


— 3.862 


39 » 


5 


5 


5 


5 


^ ,.347 


-3.-625 


24 


13 


34 


35 


14 


1.749 


-2.366 


44 


32 


^ 16 


18 


30 


1.145 


-r.4l2 


29 


25 


23 ? 


1 1 


3^ 


, .323 


r5.521 


51 


27 


21 . 


13 


'35 


1.432 


-1.043 


4 1 


7 


28 


41 


20 


.272 


-6.450 


56- 


34 


14 


20 


28 


1 . 109 


. 135 


44 


15 


36 


33 


12 


1.145 


-1.412 


64 


23 


25 


9 


39 


3:000 


-2.363 


51 


34 


14 


20 


28 


1.432 


-1.043 


fe8 


15 


36 


33 


12 


1 .014 


-2.479 


55 


' 29 


19 


15 


33 


.288 


-4.953 


77 


10 


31 


38 


17 


.442 


-3.602 


56 


17- 


38 


^1 


10 


1.109 


.135 


t36 


7 


28 


41 


20 


.887 


-1.189 


62 


18 


39 


30 


9 


.42* 


-4 .952 


91 


25 


23 


11 


3r 


1.132 


- .197 




' 39 


8 


26 


22 


3.000 


-2.363 


104 


3 


3 


3 


3 


.944 


' .050 


6$> , 


6 


6 


6 




l.OU 


-2 .479 


108 


' 8 


29 


40 


19 


.536 


-1. 155 


72 


5 


5 


- ,5 


5 


.274 


1 -6. 134 


111 


33 


15 


19 


29 


.822 


.936 


77 


32 


16 . 


18 


30 


, .442 


-3. 602 


114 


36> 


12 


22 


26 


3.000 


.960 


78 


9 


30 


39 


18 


.437 


.. -4.843 


115 


2 


2 


2 


2 


3.000 


2.023 


86 


23 


25 


9 


39 


.887 


-1.189 


120 


38 


10 


24 


24 


3.000 


1 .464 


8^ 


35 


13 


21 


27 


.721 


-2.493 


137 


6 


6 


6 


6 


.499 


- .056 


91 


30 


18 


16 


32 


1.132 


- .197 


145 


35 


13 


21 


27 


.791 


.066 


108 


33 


15 


19 


29/ 


' .536 


-1.155 


147 ^ 


17 


38 ' 


31 


10 


*.825 


1.469> 


111 


19 


40 


29 


8 


i .822 


.936 


154' 


26- 


22 


12 


36 


.872 


- .124 


141 


.8 


29.. 


i*o 


19 


3.000^ 


.960 


162 - 


31 


17. 


17. 


31 


3.000 


1.245 


38'# 10 


24* 


24 .478 


'•-1.20^^ 


• 167 




24 


10 


16'' 




2.1^5 


145 


10 


31- 


38 


17 


.791 


.086 * 


174 


16 


37 
Table B-2

Least-Squares Estimates of Testee Ability Based on the Difficulty Perceptions (x),
with Corresponding Number-Correct Scores (k) and Maximum Likelihood Ability Estimates (θ)